ABSTRACT

In this paper, we report our experiments in the TREC 2008 Relevance
Feedback Track. Our main goal is to study a novel problem in feedback,
i.e., optimization of the balance of the query and feedback information.
Intuitively, if we over-trust the feedback information, we may be biased
to favor a particular subset of relevant documents, but under-trusting it
would not take advantage of feedback. In the current feedback methods,
the balance is usually controlled by some parameter, which is often set
to a fixed value across all the queries and collections. However, due to
the difference in queries and feedback documents, this balance parameter
should be optimized for each query and each set of feedback documents.

To address this problem, we present a learning approach to adaptively
predict the balance coefficient (i.e., feedback coefficient). First,
three heuristics are proposed to characterize the relationships between
the feedback coefficient and other measures, including discrimination of
query, discrimination of feedback documents, and divergence between the
query and the feedback documents. Then, taking these three heuristics as
a road map, we explore a number of features and combine them using a
logistic regression model to predict the feedback coefficient.
Experiments show that our adaptive relevance feedback is more robust and
effective than the regular fixed-coefficient relevance feedback.

1.  INTRODUCTION 

Among many techniques for improving the accuracy of ad hoc information
retrieval, relevance feedback is arguably one of the most effective and
has been shown to be effective with a variety of retrieval models [7, 6,
8, 4, 10]. In the vector space model, feedback is usually done with the
Rocchio algorithm, which forms a new query vector by maximizing its
similarity to relevant documents and minimizing its similarity to
non-relevant documents [7]. The feedback method in classical
probabilistic models is to select expanded terms primarily based on the
Robertson/Sparck-Jones weight [6]. In the recently proposed language
modeling approaches, relevance feedback can be implemented by estimating
a query language model [3, 10] or a relevance model [4] from a set of
feedback documents.

All these existing methods show that combining feedback information with
the original query typically improves the performance. However, we need
to carefully balance the query and feedback information because if we
over-trust the feedback information, we may be biased to favor a
particular subset of relevant documents, but under-trusting it would not
take advantage of feedback. In the current feedback methods, the balance
is usually controlled by some parameter, which is often set to a fixed
value across all the queries and collections. However, due to the
difference in queries and feedback documents, this balance parameter
presumably should be optimized for each query and each set of feedback
documents.

As far as we know, how to optimize the balance of the query and feedback
information has not been well studied in previous work. Thus, in our
work, we study this novel problem in relevance feedback and propose an
adaptive feedback method which predicts a dynamic balance coefficient by
using a learning approach. Specifically, we estimate a potentially
different feedback coefficient for each query and each set of feedback
documents, rather than manually setting it to a fixed constant. We
hypothesize that the proposed method will do better than the current
fixed-coefficient approaches.

We explore a number of features potentially correlated with the feedback
coefficient and classify them into three categories: (1) discrimination
of query: we expect that the "clearer" (i.e., more discriminative) the
query is, the less feedback we need. (2) discrimination of feedback
documents: we hypothesize that clearer feedback documents can be trusted
more. (3) divergence between the query and the feedback documents: if the
divergence between a query and its feedback documents is large, it means
that the query does not represent relevant documents well, and thus we
may need a larger feedback coefficient. Following these three heuristics,
we explore a number of features and combine them using a logistic
regression model [2] to predict the feedback coefficient.

Through preliminary experiments, we observe that, although a well-tuned
fixed coefficient is not optimal for many queries, it provides a "safe"
coefficient range. Compared with it, our predicted value is sometimes too
extreme and thus "risky." So we also experimented with some strategies to
smooth our prediction using the safe fixed coefficient value. We
hypothesize that, with smoothing, our adaptive relevance feedback method
would be more robust.

In our experiments, the basic retrieval method is the KL-divergence
retrieval model [3] with the Dirichlet smoothing method [9] plus a
generative mixture model feedback method [10], which adopts our predicted
feedback coefficient. Our proposed method has shown clear improvements in
our experiments over a robust fixed-coefficient relevance feedback
method; it is also observed that most features that we explore help
predict the feedback coefficient. Through further analysis, we find that
our adaptive relevance feedback approach is still robust and effective
even if training and testing data sets are inconsistent.

In  the  rest  of  this  paper,  we  will  first  introduce  our  basic 
retrieval  method  in  Section  2.  After  that,  we  will  present 
the  adaptive  relevance  feedback  method  in  Section  3.  In 
Section  4,  we  will  describe  how  to  smooth  the  prediction 
value  to  make  relevance  feedback  more  robust.  We  report 
our  experimental  results  in  Section  5  and  conclude  our  work 
in  Section  6. 


2.  RETRIEVAL  METHOD 

To make our algorithm clear, we break down the relevance feedback task
into four steps: initial retrieval, adaptive feedback coefficient
prediction, coefficient smoothing, and query language model updating. At
the retrieval step, we adopt the KL-divergence retrieval model with the
Dirichlet smoothing method to do an initial retrieval, based on which a
number of features are explored to predict the feedback coefficient using
the logistic regression model. After that, several strategies are applied
to smooth our prediction value using a tuned fixed coefficient. Finally,
the smoothed value is plugged into the mixture model relevance feedback
method to update the query language model.

In  this  section,  we  present  our  basic  retrieval  approaches, 
the  KL-divergence  retrieval  model  and  the  mixture  model 
feedback  method. 


2.1  The  KL-Divergence  Retrieval  Model 

The KL-divergence retrieval model [3] is a generalization of the query
likelihood retrieval method proposed in [5] and can support feedback more
naturally than the query likelihood method. In this model, all the
queries and documents are represented by unigram language models, which
are essentially word distributions. Assuming that these language models
can be appropriately estimated, the KL-divergence retrieval model scores
a document $D$ with respect to a query $Q$ by computing the negative
Kullback-Leibler divergence between the query language model $\theta_Q$
and the document language model $\theta_D$ as follows:

$$S(Q,D) = -D(\theta_Q \| \theta_D) = -\sum_{w \in V} p(w|\theta_Q) \log \frac{p(w|\theta_Q)}{p(w|\theta_D)}$$


where $V$ is the set of words in our vocabulary. Clearly, the retrieval
performance of the KL-divergence model depends on how we estimate the
document model $\theta_D$ and the query model $\theta_Q$. The document
model $\theta_D$ needs to be smoothed, and an effective method is
Dirichlet smoothing [9]:

$$p(w|\theta_D) = \frac{c(w,D) + \mu\, p(w|C)}{|D| + \mu}$$

where $p(w|C)$ is the collection language model, estimated with
$p(w|C) = \frac{c(w,C)}{\sum_{w'} c(w',C)}$, and $\mu$ is a smoothing
parameter that is usually set empirically. Across all of our experiments,
we used the Dirichlet prior smoothing method for estimating document
language models.

The query model intuitively captures what the user is interested in, and
thus affects retrieval accuracy significantly. Without feedback,
$\theta_Q$ is often estimated as
$p(w|\theta_Q) = p(w|Q) = \frac{c(w,Q)}{|Q|}$, where $c(w,Q)$ is the
count of word $w$ in the query $Q$, and $|Q|$ is the total number of
words in the query.
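As an illustration of the scoring formula and the Dirichlet-smoothed document model in this subsection, the following minimal Python sketch may help. It is not the Lemur implementation used in our experiments; the function names and the dictionary-based representation of language models are assumptions made only for this example.

```python
import math
from collections import Counter


def mle_model(tokens):
    """Maximum-likelihood unigram model: p(w) = c(w) / total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_prob(w, doc_counts, doc_len, collection_model, mu=1500.0):
    """p(w | theta_D) = (c(w, D) + mu * p(w | C)) / (|D| + mu)."""
    return (doc_counts.get(w, 0) + mu * collection_model[w]) / (doc_len + mu)


def kl_score(query_model, doc_tokens, collection_model, mu=1500.0):
    """S(Q, D) = -sum_w p(w|theta_Q) * log(p(w|theta_Q) / p(w|theta_D)).

    Only words with p(w | theta_Q) > 0 contribute to the sum.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for w, pq in query_model.items():
        pd = dirichlet_prob(w, doc_counts, doc_len, collection_model, mu)
        score -= pq * math.log(pq / pd)
    return score
```

A document containing the query terms receives a higher (less negative) score than one that does not, because its smoothed probabilities for those terms are larger.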

2.2  The  Mixture  Model  Feedback  Method 

The query model described above, however, is not very discriminative
because a query is typically extremely short. Several different methods
have been proposed to improve the estimation of $\theta_Q$ by exploiting
documents, especially those documents that are used for relevance
feedback or pseudo-relevance feedback [3, 4, 10]. In [10], it was
proposed that feedback can be implemented in the KL-divergence retrieval
model by updating the query model based on the feedback documents.
Specifically, we can define a two-component mixture model (i.e., a fixed
background language model $p(w|C)$ estimated using the whole collection
and an unknown topic language model to be estimated) and assume that the
feedback documents are generated using such a mixture model. Formally,
let $\theta_T$ be the unknown topic language model and $T \subset C$ be a
set of feedback documents. The log-likelihood function of the mixture
model is:

$$L(T|\theta_T) = \sum_{D \in T} \sum_{w \in V} c(w,D) \log\left[(1-\lambda)\, p(w|\theta_T) + \lambda\, p(w|C)\right]$$

where $\lambda$ is a mixture noise parameter which controls the weight of
the background model. Given a fixed $\lambda$ ($\lambda = 0.9$ in our
experiments), a standard EM algorithm can then be used to estimate the
parameters $p(w|\theta_T)$, which are then interpolated with the original
query model $p(w|Q)$ to obtain an improved estimate of the query model:

$$p(w|\theta_Q) = (1-\alpha)\, p(w|Q) + \alpha\, p(w|\theta_T)$$

where $\alpha$ is the feedback coefficient. Similarly to other existing
feedback methods [7, 6, 8], the parameter $\alpha$ in this formula is
generally fixed across all queries and documents.
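The EM estimation and the interpolation step can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the Lemur code used in our experiments: the function names, the fixed number of iterations, and the initialization with the maximum-likelihood estimate are all choices made for this example.

```python
from collections import Counter


def estimate_topic_model(feedback_docs, collection_model, lam=0.9, iters=50):
    """EM for the mixture (1 - lam) * p(w|theta_T) + lam * p(w|C)."""
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    topic = {w: c / total for w, c in counts.items()}  # start from the MLE
    for _ in range(iters):
        # E-step: probability that an occurrence of w was generated
        # by the topic model rather than the background model.
        resp = {}
        for w in counts:
            t = (1 - lam) * topic[w]
            resp[w] = t / (t + lam * collection_model[w])
        # M-step: re-estimate the topic model from expected topic counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        topic = {w: counts[w] * resp[w] / norm for w in counts}
    return topic


def interpolate(query_model, topic_model, alpha):
    """p(w|theta_Q) = (1 - alpha) * p(w|Q) + alpha * p(w|theta_T)."""
    vocab = set(query_model) | set(topic_model)
    return {
        w: (1 - alpha) * query_model.get(w, 0.0)
        + alpha * topic_model.get(w, 0.0)
        for w in vocab
    }
```

With a background model that assigns high probability to common words, the estimated topic model shifts probability mass away from those words and toward topical terms, which is the intended effect of the mixture.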

However, due to the difference in queries and feedback documents, the
coefficient $\alpha$, which indicates the balance between query and
feedback, should be optimized for each query and each set of feedback
documents. This motivates us to study how to optimize the balance of the
query and feedback information. We view this problem as a prediction
problem and propose a learning approach to solve it. Although we explore
this idea in the context of the mixture model feedback method in this
paper, it could be applicable to other feedback methods as well. We now
present our method.


3.  FEEDBACK  COEFFICIENT  PREDICTION 

3.1  Heuristics  and  Features 

In this work, we investigate three heuristics to predict the feedback
coefficient: discrimination of query, discrimination of feedback
documents, and divergence between the query and the feedback documents.
The three heuristics capture intrinsic characteristics of the two main
components (i.e., query and feedback document set) and the relationship
between these two components in a feedback process. We argue, and then
show experimentally in Section 5, that the three heuristics all play
important roles in predicting the feedback coefficient. Possibly, many
other features can be explored by taking the three heuristics as a road
map.

3.1.1  Discrimination  of  Query 


Intuitively, if the query itself is discriminative enough, we do not need
to rely heavily on feedback documents. Hence, we expect the
discrimination of the query to be correlated with the feedback
coefficient. Several measures are proposed to quantify it.

(1) Query Length: Intuitively, for two queries $Q_1$ and $Q_2$, if $Q_1$
is longer than $Q_2$ (i.e., $Q_1$ has more terms than $Q_2$), $Q_1$ is
usually more discriminative than $Q_2$. Therefore, the query length could
be a characteristic of the discrimination of a query. To capture this
intuition, we introduce query length $|Q|$ as our first feature.
Formally, it is defined as:

$|Q|$: the number of terms in query $Q$.

(2) Entropy of Query: It is known that more entropy means more randomness
and less discrimination. Therefore, we could adopt such a concept to
measure how discriminative a query is. To compute the entropy, we need to
estimate the query language model first, which, however, again involves
an interpolation between the original query model $\theta_Q$ and the
pseudo feedback document model $\theta_{F'}$, as well as the setting of a
feedback coefficient. (Note that we use a slightly different notation,
$\theta_F$, for the relevance feedback document model, and throughout
this paper we estimate pseudo feedback models by using the top 50
documents.) To avoid this problem, we do not estimate an entropy for the
interpolated query model directly; instead, we compute two entropy
scores, one for $\theta_Q$ and one for $\theta_{F'}$, and let the
training system weigh them, which is expected to give an appropriate
approximation. Assuming that each query term appears only once in a
query, the entropy of $\theta_Q$ is defined as:

$$\mathrm{QEnt\_A1} = \sum_{w \in Q} -p(w|\theta_Q) \log_2 p(w|\theta_Q) = \log_2 |Q|$$

where $\theta_Q$ is estimated as $p(w|\theta_Q) = \frac{c(w,Q)}{|Q|}$. We
can see that QEnt_A1 is a logarithmic transformation of query length
$|Q|$. Thus, in effect, we just have another query length feature.

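Similarly, we define the entropy of $\theta_{F'}$ as follows:

$$\mathrm{QEnt\_A2} = \sum_{w \in F'} -p(w|\theta_{F'}) \log_2 p(w|\theta_{F'})$$

where $p(w|\theta_{F'})$ is estimated as
$p(w|\theta_{F'}) = \frac{c(w,F')}{\sum_{w'} c(w',F')}$.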
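The entropy features can be computed directly from term counts. The short Python sketch below (the function name is our own choice for this example) computes the entropy of a maximum-likelihood unigram model and illustrates that, when every query term occurs once, the result equals $\log_2 |Q|$.

```python
import math
from collections import Counter


def entropy(tokens):
    """Entropy (in bits) of the maximum-likelihood unigram model of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

The same function applies to the concatenated text of the top-ranked pseudo feedback documents or of the judged feedback documents.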

(3) Relative Entropy of Query: In the definition above, query entropy is
affected significantly by common terms (e.g., 'the', 'and', ...). This
problem can be addressed by using a mixture model to separate the topic
model from the background model [10]. However, they both are quite
time-consuming. So, we adopt a similar idea of "relative entropy of
query" as proposed in [1] to compute the query clarity score, which
measures the coherence of the language usage in query language models as
compared to the collection model. The "query clarity" has been shown to
be an intrinsic feature of queries and to have an important impact on the
retrieval performance [1]. Therefore, we expect that it can also predict
the feedback coefficient.

In the definition, the clarity of a query is the Kullback-Leibler
divergence of the query model from the collection model. Similar to the
computation of query entropy, the estimation of a query model plays an
important role in this definition. To avoid it, we use the same strategy
and compute two clarity scores, for $\theta_Q$ and $\theta_{F'}$
respectively.

To further reduce the effect of common terms, $\theta_{F'}$ is smoothed
with the collection language model using the Jelinek-Mercer smoothing
method with a $\lambda$ of 0.7 [9]. Following [1], we define relative
entropy QEnt_R1 and QEnt_R2 as follows:

$$\mathrm{QEnt\_R1} = \sum_{w} p(w|\theta_Q) \log \frac{p(w|\theta_Q)}{p(w|C)}$$

$$\mathrm{QEnt\_R2} = \sum_{w} p(w|\theta_{F'}) \log \frac{p(w|\theta_{F'})}{p(w|C)}$$

where $p(w|C)$ is the collection language model.
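A minimal sketch of the relative entropy features is given below. Two points are assumptions of this example rather than statements about our implementation: the Jelinek-Mercer parameter is taken to weight the collection model, i.e., $(1-\lambda)\,p(w|\theta) + \lambda\,p(w|C)$, and the logarithm is taken to base 2.

```python
import math


def jm_smooth(model, collection_model, lam=0.7):
    """Jelinek-Mercer smoothing; lam is assumed to weight the collection."""
    return {
        w: (1 - lam) * model.get(w, 0.0) + lam * pc
        for w, pc in collection_model.items()
    }


def relative_entropy(model, collection_model):
    """KL divergence (base 2, an assumption) of model from the collection."""
    return sum(
        p * math.log2(p / collection_model[w])
        for w, p in model.items()
        if p > 0
    )
```

A model identical to the collection model has relative entropy zero, and smoothing a peaked model toward the collection lowers its score, which is how the influence of common terms is damped.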

3.1.2  Discrimination  of  Feedback  Documents 

Intuitively, if feedback documents are more discriminative, it means that
they focus more on the relevant topic and are farther away from noise.
Therefore, discriminative feedback documents can be trusted more in the
feedback process.

(1) Feedback Length: For a query $Q$ and two possible relevance judgment
sets $F_1$ and $F_2$, if $F_1$ has more documents than $F_2$, $F_1$
usually contains more intensive relevant information than $F_2$; thus,
$F_1$ could be more discriminative than $F_2$ in describing relevant
information. Therefore, the number of feedback documents, which we define
as feedback length, can be taken as a characteristic of the
discrimination of the feedback document set. Formally, feedback length
$|F|$ is defined as follows:

$|F|$: the number of documents in $F$.

(2) Entropy of Feedback Documents: Feedback length, as described above,
captures the discrimination of feedback documents on the document level,
whereas the entropy of feedback documents, which measures the term
distribution, is on the term level. Usually, more entropy means a more
random term distribution; thus it is not clear which topic the feedback
documents talk about. Similarly to the computation of query entropy, the
entropy of the feedback model $\theta_F$ is defined as:

$$\mathrm{FBEnt\_A} = \sum_{w \in F} -p(w|\theta_F) \log_2 p(w|\theta_F)$$

where $p(w|\theta_F)$ is estimated as
$p(w|\theta_F) = \frac{c(w,F)}{\sum_{w'} c(w',F)}$.

(3) Relative Entropy of Feedback Documents: Similar to the query entropy
QEnt_A2, the computation of the feedback document entropy FBEnt_A is also
affected severely by common terms. So, we follow the same idea to smooth
$\theta_F$ using the Jelinek-Mercer smoothing method and then compute the
"relative entropy of feedback documents" as an alternative feature, which
is defined as follows:

$$\mathrm{FBEnt\_R} = \sum_{w} p(w|\theta_F) \log \frac{p(w|\theta_F)}{p(w|C)}$$

3.1.3  Divergence between Query and Feedback Documents

The motivation for measuring the divergence between query and feedback
documents is that we have to rely on feedback more if the query does not
represent relevant information well (i.e., if the divergence between the
query and its feedback documents is large). Below, we list two measures
to quantify it.

(1) Absolute Divergence:

A direct and intuitive way to estimate the divergence is to compute the
divergence between the query model $\theta_Q$ and the feedback model
$\theta_F$ using the KL-divergence formula. $\theta_F$ is easily
estimated using the maximum likelihood estimator:
$p(w|\theta_F) = \frac{c(w,F)}{\sum_{w'} c(w',F)}$.

We simply use the pseudo feedback document model $\theta_{F'}$ instead of
$\theta_Q$ to compute the divergence, which is defined below:

$$\mathrm{QFBDiv\_A} = \sum_{w} p(w|\theta_F) \log \frac{p(w|\theta_F)}{p(w|\theta_{F'})}$$

To prevent zero probabilities, $\theta_{F'}$ is smoothed using the
collection language model as
$p(w|\theta_{F'}) = \frac{c(w,F') + \mu\, p(w|C)}{\sum_{w'} c(w',F') + \mu}$,
where $\mu$ is set to 1500.

We  call  this  divergence  “absolute  divergence”  in  contrast 
to  the  relative  divergence  to  be  defined  below. 

(2)  Relative  Divergence: 

With the above absolute divergence, it is often difficult to say that a
large divergence value means a bad query, because the absolute divergence
only relies on $\theta_F$ and $\theta_Q$ but does not take other useful
factors into consideration, e.g., the divergence between the query model
and the negative feedback model. In fact, if the divergence between the
query and the negative feedback is much larger than that between the
query and the positive feedback documents, we can say that the query
represents relevant information well, regardless of the absolute
divergence value.

To address this problem, we propose another feature to capture a relative
divergence. Consider the following scenario: in a search process, if
document $D$ is judged as a relevant document but its rank in the result
list is very low, this also shows that the query does not represent the
feedback documents well. Hence, intuitively, the rank of a document also
measures the divergence between query and feedback documents, and such a
measure seems more comparable among different queries. Because there is
sometimes more than one feedback document, we adopt an average rank in
our prediction system, as follows:

$$\mathrm{QFBDiv\_R1} = \sum_{d \in F} \frac{r_d}{|F|}$$

where $r_d$ is the rank of document $d$ (e.g., the rank of the first
document is 1, that of the second one is 2, and so on), and $|F|$ is the
feedback length as described before.

In the formula above, a large QFBDiv_R1 value means a low rank.
Intuitively, we would like QFBDiv_R1 to contribute positively to the
measure of the divergence between query and feedback documents, which
simply says that a higher QFBDiv_R1 implies a larger divergence. However,
we would like the contribution of the rank measure to change quickly when
QFBDiv_R1 is small and to become nearly constant as it becomes larger.
The rationale of this heuristic is the following: a low rank of feedback
documents (i.e., a large QFBDiv_R1) often implies a large divergence
between query and feedback documents, so we should take such a measure
into consideration when computing the divergence; however, when QFBDiv_R1
is very large (i.e., the feedback documents are ranked very low), the
contribution of rank should not be as sensitive to differences in rank as
when it is small. The heuristic suggests a concave curve relating
QFBDiv_R1 and the query-feedback divergence, as shown in Figure 1.

Figure 1: Approximate relation between the query-feedback divergence and
the average rank.

To capture such a heuristic, we propose another measure by taking a
logarithm transformation of the average rank to approximate the
divergence $div(Q,F)$, as follows:

$$\mathrm{QFBDiv\_R2} = \log \sum_{d \in F} \frac{r_d}{|F|}$$
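The two rank-based features are simple to compute once the ranks of the feedback documents in the initial result list are known. In the sketch below, the function names are hypothetical and the natural logarithm is an assumption, since the base of the logarithm is not specified in the definition.

```python
import math


def average_rank(ranks):
    """QFBDiv_R1: mean rank of the feedback documents (ranks start at 1)."""
    return sum(ranks) / len(ranks)


def log_average_rank(ranks):
    """QFBDiv_R2: logarithm of the average rank (natural log assumed)."""
    return math.log(average_rank(ranks))
```

The logarithm makes the feature concave in the average rank: moving from rank 1 to rank 2 changes the feature far more than moving from rank 101 to rank 102.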

3.2  Learning  Algorithm 

Based on our heuristics and features, we hope to utilize some learning
technique to obtain equations which predict feedback coefficients.

We propose to use the logistic regression model [2], which appears to
model our problem well: it can take any value from negative infinity to
positive infinity as an input, whereas the output is confined to values
between 0 and 1.

Logistic regression models are also called maximum entropy models in some
communities. In particular, logistic regression models are of the form:

$$f(z) = \frac{1}{1 + \exp(-z)}$$

where the variable $z$ represents the exposure to some set of features,
while $f(z)$ represents the probability of a particular outcome, given
that set of features. The variable $z$ is a measure of the total
contribution of all the features used in the model, which is usually
defined as $z = w \cdot x$. Specifically, $x$ is a vector of numeric
values representing the features; for instance, our features might
include query length $|Q|$, the entropy of feedback documents, etc. And
$w$ represents a set of weights, which indicate the relative weights for
each feature. A positive weight means that the corresponding feature
increases the probability of the outcome, while a negative weight means
that its corresponding feature decreases the probability of that outcome;
a large weight means that the feature strongly influences the probability
of that outcome, while a near-zero weight means that the feature has
little influence on the probability of that outcome.

Typically, we learn these weights using training data. For our problem,
the training data would consist of feature values along with the
corresponding optimal feedback coefficient. To construct such a training
data set, we exhaust the feedback coefficient space for each
query-feedback pair to find its optimal coefficient (more details are
given in Section 5); each query-feedback pair together with its optimal
coefficient forms one instance of our training data. Because logistic
regression models have a global optimum, the choice of learning algorithm
is usually of little importance. In our study, we use the statistical
package R¹ to train our model.

Once the weight vector $w$ of the equation has been derived for a
particular data set (training data), these weights may be used to predict
feedback coefficients for new queries.
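To make the prediction step concrete, the following sketch fits a logistic model to fractional targets (optimal coefficients in [0, 1]) by batch gradient descent on the cross-entropy loss. This is an illustrative stand-in, not the R procedure we used: the function names, the explicit intercept term, the learning rate, and the number of epochs are all assumptions of this example.

```python
import math


def logistic(z):
    """f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted feedback coefficient for feature vector x."""
    return logistic(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit(features, targets, lr=0.1, epochs=2000):
    """Batch gradient descent on cross-entropy with targets in [0, 1]."""
    n = len(features[0])
    m = len(features)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, targets):
            err = predict(weights, bias, x) - y  # d(loss)/dz for one example
            grad_b += err
            for j in range(n):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / m
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
    return weights, bias
```

For example, with query length as the only feature and optimal coefficients that decrease as queries get longer, the learned weight is negative, so longer queries receive smaller predicted coefficients.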

4.  FEEDBACK  COEFFICIENT  SMOOTHING 

One advantage of our adaptive feedback algorithm is that we can naturally
incorporate many features as evidence to improve our estimation of the
feedback coefficient. However, through preliminary experiments, we
observe that, although a fixed coefficient may not be optimal for many
queries, it provides a "safe" coefficient range; compared with it, our
predicted dynamic value, though optimized based on many features, is
sometimes too extreme and thus "risky". To address this problem, we
experimented with some strategies to smooth our prediction using the safe
fixed coefficient value. We hypothesize that, with smoothing, our
adaptive relevance feedback method would be more robust.

Formally, let $\alpha_f$ be the fixed feedback coefficient and $\alpha_d$
be the dynamically predicted coefficient. Our task here is to estimate a
more robust feedback coefficient, which we denote by $\alpha_c$, obtained
by smoothing $\alpha_d$ using $\alpha_f$. We now describe several
different smoothing strategies.

(1) Linear Interpolation: Our first idea is to linearly interpolate the
two feedback coefficients (i.e., $\alpha_f$ and $\alpha_d$) to obtain the
final coefficient $\alpha_c$, which is defined below.

$$\alpha_c = (1 - \beta)\,\alpha_f + \beta\,\alpha_d$$

where $\beta \in [0, 1]$ is a parameter to control the weight on each
coefficient. If $\beta = 0$, $\alpha_c$ simplifies to $\alpha_f$; if
$\beta = 1$, $\alpha_c$ simplifies to $\alpha_d$; otherwise $\alpha_c$ is
between $\alpha_f$ and $\alpha_d$. In our experiments, $\beta$ is
experimentally set to 0.5.

(2) Range Normalization: Suppose there is a safe range for the feedback
coefficient, $[\alpha_f - \delta_l, \alpha_f + \delta_r]$, and we would
like to restrict $\alpha_c$ to this safe range. If we have some prior
knowledge about $\delta_l$ and $\delta_r$, we can obtain the safe range
easily. However, most of the time, we do not have such knowledge and have
to approximate $\delta_l$ and $\delta_r$. In this work, they are
empirically approximated as $\delta_l = \gamma(\alpha_f - 0)$ and
$\delta_r = \gamma(1 - \alpha_f)$, where $\gamma$ indicates the breadth
of the safe range. Based on these assumptions, we propose to use the
following formula to obtain the final feedback coefficient.

$$\alpha_c = 2\delta\alpha_d + \alpha_f - \delta$$

where $\delta = \delta_l$ if $\alpha_d < \alpha_f$, and
$\delta = \delta_r$ otherwise. It is clear that $\alpha_c$ is restricted
to $[\alpha_f - \delta_l, \alpha_f + \delta_r]$. In our experiments,
$\gamma$ is experimentally set to 0.5.

(3) Pivoted Interpolation: Intuitively, if the feedback coefficient is
not very large, the performance of relevance feedback will be at least as
good as that of the original query; however, if we use a large
coefficient, we have a relatively higher chance of hurting the retrieval
performance.

To strike a balance between exploration and exploitation, we suggest a
conservative strategy that takes advantage of the predicted coefficient
without taking on too much risk. Specifically, if $\alpha_d < \alpha_f$,
we use $\alpha_d$ as the feedback coefficient; otherwise, we use
$\alpha_f$. Formally, this Pivoted Interpolation is defined as follows.

$$\alpha_c = (1 - \beta)\,\alpha_f + \beta\,\alpha_d$$

where $\beta = 1$ if $\alpha_d < \alpha_f$, and $\beta = 0$ otherwise.

¹ http://www.r-project.org/

Table 1: Parameters in Constructing Working Sets. The row "Docs per
Query" means that we use the top 2500 documents of each query to
construct working sets.

Task            | A | B | C | D | E
Retrieval Model | LM + Dirichlet (μ = 1500)
FB Model        | Relevance Model 2
FB Term Count   | - | 30 | 50
Docs per Query  | 2500
Stopword        | No 319 common words

Table 2: Parameters in Relevance Feedback

Smoothing | Mixture Noise | FB Term Count | Others
μ = 1500  | 0.9           | 100           | default
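The three smoothing strategies of Section 4 can be sketched in a few lines of Python. The function names are chosen for this example only, and the range normalization sketch assumes that the left half-width $\delta_l$ applies when $\alpha_d < \alpha_f$ and the right half-width $\delta_r$ otherwise.

```python
def linear_interpolation(alpha_f, alpha_d, beta=0.5):
    """alpha_c = (1 - beta) * alpha_f + beta * alpha_d."""
    return (1 - beta) * alpha_f + beta * alpha_d


def range_normalization(alpha_f, alpha_d, gamma=0.5):
    """alpha_c = 2 * delta * alpha_d + alpha_f - delta."""
    delta_l = gamma * (alpha_f - 0.0)  # half-width below alpha_f
    delta_r = gamma * (1.0 - alpha_f)  # half-width above alpha_f
    # Assumption: use delta_l when the prediction is below the fixed value.
    delta = delta_l if alpha_d < alpha_f else delta_r
    return 2 * delta * alpha_d + alpha_f - delta


def pivoted_interpolation(alpha_f, alpha_d):
    """Use alpha_d only when it is smaller than the fixed coefficient."""
    beta = 1.0 if alpha_d < alpha_f else 0.0
    return (1 - beta) * alpha_f + beta * alpha_d
```

With $\alpha_f = 0.5$ and a predicted $\alpha_d = 0.9$, linear interpolation gives 0.7, while pivoted interpolation falls back to the fixed value 0.5 because the prediction exceeds it.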

5.  EXPERIMENT  RESULTS 
5.1  Data  Preprocessing 

We employ the Lemur toolkit (version 4.5) and the Indri search engine
(version 2.5)² in our experiments. Below, we describe how we pre-process
the data collection and how we prepare the training data.

First, due to the large size of the GOV2 data collection, we decide to do
the retrieval experiments on a subset of the collection, instead of the
whole 426 GB of data. To make the retrieval experiments on our working
sets equivalent to those on the whole data set, we use Indri to construct
5 working sets, one for each task. Specifically, we first build an index
on the whole GOV2 collection, then we retrieve result documents for each
query (for tasks B-E, we do relevance feedback based on the corresponding
positive judgments), and finally these result documents are extracted to
construct our working sets. The parameters we adopt to construct working
sets are shown in Table 1; other parameters are the same as Indri's
default settings.

Furthermore, we pre-compute a global collection model $\theta_C$ using
the whole data set, and in the following experiments, whenever we need to
access a local collection language model (i.e., the language model of a
specified working set) to smooth document models, we just use $\theta_C$
instead. To verify that our working sets work appropriately, we run some
retrieval experiments on them and observe that the retrieval performance
is almost the same as that on the whole data set.

After  constructing  working  sets,  we  build  a  separate  index 
for  each  working  set.  Throughout  this  paper,  when  building 
any  index,  we  only  stem  words  using  the  Porter  algorithm, 
without  any  other  preprocessing. 

² http://www.lemurproject.org/

Figure 2: The sensitivity to feedback coefficient of some Terabyte
query topics.

To train our adaptive relevance feedback model, we need training
data. In our study, we use the Terabyte topics (701-850, excluding
those included in this year's test set) as the training queries. There
are 100 such topics in total, of which one has no relevant documents
and another fails to return any relevant documents, so we finally adopt
98 topics as training data. Then, for each query, we randomly select
some top relevant documents to simulate "judgments". Specifically, for
each query, we select only 1 relevant document for feedback with
probability 0.3, and 3, 4, 5, or 6 relevant documents with
probabilities 0.4, 0.1, 0.1, and 0.1, respectively. This distribution
over the number of relevant documents, {0.3, 0.4, 0.1, 0.1, 0.1}, is
fixed heuristically to approximate the numbers of relevant documents
used for feedback in the official tasks B, C, and D.
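The sampling of simulated judgments described above can be sketched as
follows. This is only an illustration under stated assumptions: the
function name, the list-of-IDs input, and the reading of "top relevant
documents" as the highest-ranked relevant ones are ours, not details
given in the text.

```python
import random

# Number of relevant documents to use and the probability of each choice,
# as stated in the text: P(1)=0.3, P(3)=0.4, P(4)=P(5)=P(6)=0.1.
FEEDBACK_SIZES = [1, 3, 4, 5, 6]
SIZE_PROBS = [0.3, 0.4, 0.1, 0.1, 0.1]


def sample_feedback_docs(ranked_relevant_docs, rng):
    """Pick simulated judgments for one query.

    ranked_relevant_docs: relevant document IDs in retrieval-rank order
    (a hypothetical input layout; the source does not specify one).
    """
    k = rng.choices(FEEDBACK_SIZES, weights=SIZE_PROBS, k=1)[0]
    # Assumption: take the k highest-ranked relevant documents; fewer are
    # returned when the query has fewer than k relevant documents.
    return ranked_relevant_docs[:k]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
docs = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]  # hypothetical IDs
print(sample_feedback_docs(docs, rng))
```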

After that, we also construct a working set for the training data
using the same parameter setting as for task C. Finally, with the Lemur
toolkit, we adopt the KL-divergence retrieval model with mixture model
feedback to do relevance feedback experiments (related parameters are
shown in Table 2); by trying different feedback coefficients (0.0, 0.1,
..., 1.0), we obtain the optimal coefficient value for each query. The
4th column of Table 3 gives some examples of the optimal coefficients.
We use this data set to learn the prediction model.
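The per-query search for the optimal coefficient amounts to a grid
search over 0.0, 0.1, ..., 1.0. A minimal sketch is given below; the
callable evaluate_ap is a hypothetical stand-in for running the feedback
retrieval with a given coefficient and scoring the result, and the
tie-breaking rule is our own choice.

```python
def best_feedback_coefficient(evaluate_ap, step=0.1):
    """Return (alpha, score) maximizing evaluate_ap over the grid 0.0..1.0.

    evaluate_ap: hypothetical callable mapping a coefficient to the
    average precision of the feedback run for one query.
    """
    n_steps = round(1.0 / step)
    # Build the grid from integers to avoid floating-point drift.
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    scores = [(evaluate_ap(alpha), alpha) for alpha in grid]
    # On ties, prefer the smallest coefficient (an arbitrary choice here).
    best_score, best_alpha = max(scores, key=lambda s: (s[0], -s[1]))
    return best_alpha, best_score


# Toy scoring function that peaks at 0.4, used only for illustration.
alpha, score = best_feedback_coefficient(lambda a: 1.0 - (a - 0.4) ** 2)
```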

5.2  Sensitivity  of  Feedback  Coefficient 

As we have discussed in Section 2, relevance feedback is controlled
by a coefficient α. When α = 0, we use only the original query model
(i.e., no feedback), while when α = 1, we completely ignore the
original query and rely only on the feedback model. To show the
sensitivity to α, we plot the MAP of some queries (Terabyte topics 757,
776, and 793) in relevance feedback experiments, varying α from 0 to 1.
The results are shown in Figure 2. We can observe that the setting of
the feedback coefficient α can affect performance significantly, and
that the optimal coefficients for different queries can be quite
different.
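The two endpoints described above are what a linear interpolation of the
query model and the feedback model produces. The sketch below assumes
that form, i.e., (1 - α) times the query model plus α times the feedback
model, which is the usual update in mixture model feedback; the function
name and the dictionary representation of the language models are
illustrative choices, not the paper's implementation.

```python
def interpolate_query_model(query_model, feedback_model, alpha):
    """Mix two unigram language models given as {term: probability} dicts."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    terms = set(query_model) | set(feedback_model)
    return {
        t: (1.0 - alpha) * query_model.get(t, 0.0)
        + alpha * feedback_model.get(t, 0.0)
        for t in terms
    }


# Hypothetical toy models: alpha = 0 keeps the query model unchanged,
# alpha = 1 replaces it with the feedback model.
query = {"tax": 0.5, "law": 0.5}
feedback = {"law": 0.25, "reform": 0.75}
updated = interpolate_query_model(query, feedback, 0.5)
```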

5.3  Preliminary  Experiment 

Topic   Adaptive   Fixed   Optimal
785     0.3525     0.6     0.4
787     0.4629     0.6     0.2
789     0.4654     0.6     0.7
791     0.6799     0.6     0.3
793     0.6006     0.6     0.0
795     0.8717     0.6     0.9
797     0.4897     0.6     0.4
799     0.3533     0.6     0.5
801     0.5830     0.6     0.8
805     0.7812     0.6     1.0
807     0.5145     0.6     0.5
809     0.4262     0.6     0.5
810     0.7063     0.6     0.8
811     0.5002     0.6     0.5
813     0.5616     0.6     0.5
815     0.6266     0.6     0.8
816     0.4972     0.6     0.1
817     0.3873     0.6     0.7
819     0.4527     0.6     0.2
821     0.4929     0.6     0.5

Table 3: Samples of Prediction Values

We find that Relative Query Entropy QEnt_R1, Relative Entropy of
Feedback Document FBEnt_R, Absolute Divergence QFBDiv_A, and Relative
Divergence QFBDiv_R2 have the most significant influence on the ability
to predict accurate feedback coefficients. Thus, we keep these four
features in our final prediction model, and the following coefficients
are then derived from our training algorithm.

$$z_0 = -0.93265 + 0.09890 \cdot \mathrm{QEnt\_R1} - 1.45937 \cdot \mathrm{FBEnt\_R} + 0.28350 \cdot \mathrm{QFBDiv\_A} + 0.32427 \cdot \mathrm{QFBDiv\_R2}$$

where we use the absolute value of each feature. Then, we can predict
the feedback coefficient directly as below:

$$\alpha = f(z_0) = \frac{1}{1 + \exp(-z_0)}$$

From the above formulas, we can see clearly that QFBDiv_A and
QFBDiv_R2 play roles similar to those we discussed in Section 3.
However, Relative Query Entropy QEnt_R1 increases the feedback
coefficient, which means that, if a query is more discriminative, we
can use a higher feedback coefficient. This is in contrast to our
initial expectation. One explanation is that a more discriminative
query (i.e., one with a high clarity score) is more tolerant of drift,
and thus it is safe to use a large feedback coefficient in this case.
Also, FBEnt_R is negatively correlated with the feedback coefficient α,
which is also in contrast to our intuition. One possible explanation is
that we do not need a large feedback coefficient if the feedback is too
discriminative, since discriminative feedback can easily cause the
original query to drift.

With the prediction formula, we can compute potentially different
feedback coefficients for different queries. Note that the four
features can all be computed efficiently, because we only use maximum
likelihood estimation for the related language models.
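As an illustration of the prediction step, the sketch below plugs the
fitted weights reported above into the logistic function. The weights
and the use of absolute feature values come from the text; the function
name, the dictionary interface, and the example feature values are
hypothetical.

```python
import math

# Intercept and weights copied from the fitted model reported in the text.
INTERCEPT = -0.93265
WEIGHTS = {
    "QEnt_R1": 0.09890,
    "FBEnt_R": -1.45937,
    "QFBDiv_A": 0.28350,
    "QFBDiv_R2": 0.32427,
}


def predict_feedback_coefficient(features):
    """Map the four feature values to a coefficient in (0, 1)."""
    # The text states that the absolute value of each feature is used.
    z = INTERCEPT + sum(w * abs(features[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Made-up feature values, for illustration only.
example = {"QEnt_R1": 1.2, "FBEnt_R": 0.4, "QFBDiv_A": 2.0, "QFBDiv_R2": 0.8}
alpha = predict_feedback_coefficient(example)
```

Because the output passes through the logistic function, the predicted
coefficient always lies strictly between 0 and 1, so it can be used
directly as an interpolation weight.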

We  evaluate  our  method  on  the  training  data  by  using 
10-fold  cross  validation.  Some  examples  of  our  prediction 
values  are  shown  in  the  second  column  of  Table  3. 
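For readers unfamiliar with the protocol, a 10-fold split over the 98
training queries can be sketched as below. The shuffling, the seed, and
the use of integer indices in place of the actual topic IDs are our
assumptions; the text does not say how the folds were formed.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists so that each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of nearly equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


queries = list(range(98))  # stand-ins for the 98 training topics
for train, test in k_fold_splits(queries, k=10):
    pass  # fit the regression on `train`, predict coefficients for `test`
```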

We compare our adaptive feedback method with our baseline system,
i.e., the fixed feedback coefficient approach (where we use a fixed
coefficient of 0.6, since it gives the best performance). Before
evaluation, the judged documents are removed from the results, and the
result list is then restricted to the top 1000 documents. In addition,
we compute the Mean Absolute Error (MAError) to indicate how far the
coefficients used in the two methods are from the optimal coefficients.
The comparison is shown in Table 4, which indicates that our approach
clearly outperforms the fixed-coefficient approach: MAError drops from
0.2173 to 0.1824, and MAP rises from 0.3243 to 0.3341.

              MAError   MAP       Recall
Fixed         0.2173    0.3243    12070/18649
Adaptive      0.1824    0.3341*   12365/18649
Improvement   16.1%     3.0%      2.4%
upper-bound   0         0.3553    12696/18649

Table 4: Performance Comparison on Training Data

               MAError   MAP      Recall
All Features   0.1824    0.3341   12365/18649
No FBEnt_R     0.1879    0.3320   12308/18649
No QEnt_R1     0.1905    0.3294   12295/18649
No QFBDiv_R2   0.1950    0.3307   12409/18649
No QFBDiv_A    0.1842    0.3330   12316/18649

Table 5: Contributions of Features on Training Data. "No XXX" means
removing feature XXX.
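The two evaluation steps described above, removing judged documents
before scoring and measuring the distance to the optimal coefficients,
can be sketched as follows. The function names and list-based inputs are
illustrative; the numbers in the usage lines are the first three rows of
Table 3 (topics 785, 787, and 789).

```python
def residual_ranking(ranked_docs, judged_docs, cutoff=1000):
    """Drop documents used for feedback, then keep the top `cutoff` results."""
    judged = set(judged_docs)
    return [d for d in ranked_docs if d not in judged][:cutoff]


def mean_absolute_error(predicted, optimal):
    """Average |predicted - optimal| over queries."""
    if len(predicted) != len(optimal) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, optimal)) / len(predicted)


adaptive = [0.3525, 0.4629, 0.4654]   # Adaptive column, Table 3
fixed = [0.6, 0.6, 0.6]               # Fixed column, Table 3
optimal = [0.4, 0.2, 0.7]             # Optimal column, Table 3
mae_adaptive = mean_absolute_error(adaptive, optimal)
mae_fixed = mean_absolute_error(fixed, optimal)
```

On these three topics alone the adaptive coefficients are closer to the
optimum than the fixed value; Table 4 reports the same comparison over
all 98 training queries.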

Next, we perform several experiments to show the contributions of
individual features. Table 5 shows the results, where one feature is
removed at a time. From Table 5, we can see that each feature plays an
important role. As an example, the derived formula for predicting the
coefficient without QEnt_R1 is given below.

$$z_1 = 0.01156 - 0.77456 \cdot \mathrm{FBEnt\_R} + 0.20068 \cdot \mathrm{QFBDiv\_A} + 0.35399 \cdot \mathrm{QFBDiv\_R2}$$

Also, we design some experiments to evaluate the proposed smoothing
methods (Section 4). The results are shown in Table 6. They indicate
that Range Normalization (Norm) > Linear Interpolation (Linear) > Pivot
Interpolation (Pivot); however, no smoothing method outperforms our
basic adaptive feedback method. This may be because we did not tune the
parameters of these smoothing methods, which will be studied further in
future work.

5.4  Official  Run  Results 

We submitted two runs for each task, in which the various techniques
we designed are applied. These runs are described in Table 7. UIUC.B1,
UIUC.C1, UIUC.D1, and UIUC.E1 use the proposed adaptive relevance
feedback plus the range normalization smoothing method; UIUC.B2 and
UIUC.C2 also adopt our adaptive feedback but use pivot and linear
interpolation, respectively, as the smoothing method; UIUC.D2 tests the
adaptive feedback alone, without any smoothing method; in UIUC.E2, we
add pseudo relevance feedback on top of the adaptive relevance
feedback. Table 8 shows the performance of these official runs.

               NoSmooth   Linear   Norm     Pivot
All Features   0.3341     0.3316   0.3323   0.3265
No FBEnt_R     0.3320     0.3301   0.3308   0.3263
No QEnt_R1     0.3294     0.3295   0.3304   0.3244
No QFBDiv_R2   0.3307     0.3294   0.3299   0.3254
No QFBDiv_A    0.3330     0.3308   0.3310   0.3262

Table 6: Comparison of Smoothing Methods on Training Data

RunID     Description
UIUC.A1   No Relevance Feedback
UIUC.B1   Adaptive + Norm
UIUC.B2   Adaptive + Pivot
UIUC.C1   Adaptive + Norm
UIUC.C2   Adaptive + Linear
UIUC.D1   Adaptive + Norm
UIUC.D2   Adaptive
UIUC.E1   Adaptive + Norm
UIUC.E2   Adaptive + PseudoFB

Table 7: Description of Runs

RunID     MAP      MTC      StatAP
UIUC.A1   0.1240   0.0460   0.2127
UIUC.B1   0.1868   0.0650   0.2886
UIUC.B2   0.1770   0.0641   0.2833
UIUC.C1   0.1971   0.0714   0.3192
UIUC.C2   0.1971   0.0714   0.3192
UIUC.D1   0.2078   0.0722   0.3397
UIUC.D2   0.2079   0.0712   0.3327
UIUC.E1   0.2118   0.0673   0.3284
UIUC.E2   0.1744   0.0583   0.2681

Table 8: Official Results of Runs

After our official runs were submitted, we discovered that our
implementation of the range normalization was not quite accurate and
that we had left out the query-feedback document divergence feature
QFBDiv_A. So, we decided to re-compute our runs. Note that we did not
change anything in our algorithm, only the implementation. We also
generated some additional runs to compare the different techniques we
proposed. Tables 9 and 10 show the performance of these runs.

Compared with our preliminary experimental results, there are two
significant changes: the feature QFBDiv_R2 hurts performance, and our
smoothing methods, especially Range Normalization, improve performance.
From Table 9, we can see that "No QFBDiv_R2" outperforms "All Features"
in every setting, and that "No QFBDiv_R2" beats the baseline system in
every setting except Pivot interpolation (0.1826 vs. 0.1889); with
Range Normalization, the performance of all methods improves.

Task B         NoSmooth   Linear   Norm     Pivot
Baseline       0.1889     -        -        -
UIUC.B1        -          -        0.1868   -
UIUC.B2        -          -        -        0.1770
All Features   0.1781     0.1870   0.1892   0.1771
No FBEnt_R     0.1719     0.1837   0.1874   0.1730
No QEnt_R1     0.1718     0.1820   0.1847   0.1724
No QFBDiv_R2   0.1892     0.1915   0.1948   0.1826
No QFBDiv_A    0.1760     0.1862   0.1891   0.1753

Table 9: Performance Comparison on Task B

Task C         NoSmooth   Linear   Norm     Pivot
Baseline       0.1948     -        -        -
UIUC.C1        -          -        0.1971   -
UIUC.C2        -          0.1971   -        -
All Features   0.1919     0.1955   0.1973   0.1904
No QFBDiv_R2   0.1968     0.1976   0.1994   0.1893

Task D         NoSmooth   Linear   Norm     Pivot
Baseline       0.2057     -        -        -
UIUC.D1        -          -        0.2078   -
UIUC.D2        0.2079     -        -        -
All Features   0.2071     0.2090   0.2100   0.2039
No QFBDiv_R2   0.2075     0.2073   0.2118   0.2006

Task E         NoSmooth   Linear   Norm     Pivot
Baseline       0.2182     -        -        -
UIUC.E1        -          -        0.2118   -
All Features   0.1797     0.2123   0.2110   0.2182
No QFBDiv_R2   0.2192     0.2193   0.2184   0.2182

Table 10: Performance on Task C, D and E

One possible explanation of the two significant changes is that the
training data and the testing data are quite inconsistent. In our
training data, we use top relevant documents to simulate judged
documents, because in the real world users tend to judge top-ranked
documents; in the testing data, however, the judged documents are often
ranked very low (or do not even occur in the top 2500 result
documents). Our feature QFBDiv_R2 measures the rank of feedback
documents and is thus very sensitive to the difference between the two
data sets. On the other hand, this also shows that our prediction model
without QFBDiv_R2 is very robust, working well even on such a different
data set; and although the parameter of our Range Normalization
smoothing method is not tuned, it still helps a lot when testing and
training data are not consistent.

Another reason for the changes may be the sparseness of our training
data. We only use 98 training queries to fit a model over 4 features,
which has nevertheless led to a robust relevance feedback method. It
would be interesting to see whether a larger number of queries would
lead to a more effective logistic regression approach to predicting
feedback coefficients.

6.  CONCLUSIONS 

In summary, we studied a novel problem in feedback, i.e.,
optimization of the balance between the query and feedback information,
in this year's relevance feedback task, and proposed an adaptive
relevance feedback approach to dynamically predict the feedback
coefficient. Our experimental results show that the proposed method is
robust and effective, outperforming fixed-coefficient relevance
feedback even when the training and testing data sets are not
consistent.

Besides,  we  also  designed  three  smoothing  strategies  to 
smooth  our  predicted  coefficients  to  make  them  more  robust. 
Among  the  three  smoothing  methods,  Range  Normalization 
is  the  most  effective  one;  smoothing  our  prediction  value  can 
make  it  more  robust,  especially  when  training  and  testing 
data  sets  are  inconsistent. 

Among our features, we find that Relative Query Entropy QEnt_R1,
Relative Entropy of Feedback Document FBEnt_R, Absolute Divergence
QFBDiv_A, and Relative Divergence QFBDiv_R2 are the most effective
ones. However, QFBDiv_R2 is very sensitive to the data set and should
be used carefully.

There is still much room to explore in future work. We should study
more effective and robust features. We also hope to apply our method to
other feedback models, e.g., Rocchio feedback, to evaluate its
performance. In addition, it will be interesting to explore how to
adaptively predict feedback coefficients in pseudo and implicit
relevance feedback models.


This material is based upon work supported by the National Science
Foundation under Grant Numbers IIS-0347933, IIS-0713581, and
FIBR-0425852.